My goal with this analysis is to find out whether Brazilian deputies have been using their reimbursement rights improperly.
These reimbursements are called the Quota for the Exercise of Parliamentary Activity: “[a quota] destined to pay for expenses exclusively linked to the exercise of parliamentary activity”. Therefore, as far as I could tell, the available data offers two main ways to detect an improper reimbursement claim:

* The refund category is suspicious
* The timing of the refund is suspicious
To investigate suspicious refund categories, I plotted one box plot per category. Much to my surprise, this first plot already revealed something very odd.
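```r
library(plotly)  # provides plot_ly(), layout() and re-exports the %>% pipe

# Plot refund descriptions: one box plot of refund values per category
deputies %>%
  plot_ly(
    x = ~refund_description,
    y = ~refund_value,
    type = "box",
    color = ~refund_description
  ) %>%
  layout(
    legend = list(orientation = "h"),  # horizontal legend
    xaxis = list(
      showticklabels = FALSE,  # category names appear in the legend instead
      title = ""
    )
  )
```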
“Dissemination of parliamentary activity” seems to have far more outliers than any other category. Because this refund description is so vague, I suspect that deputies may be using it widely as cover for improper refunds.
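The “more outliers” impression can also be checked numerically instead of by eye. The sketch below counts, per category, the refunds that fall outside the usual 1.5 × IQR box-plot fences. It assumes the `deputies` columns used in the plot (`refund_description`, `refund_value`); the helper `count_outliers()` and the toy data are hypothetical, and the counts may differ slightly from the plotly figure because quartile interpolation methods vary.

```r
library(dplyr)

# Hypothetical helper: count, per refund category, the refunds that fall
# outside the 1.5 * IQR fences a box plot uses to flag outliers.
count_outliers <- function(df) {
  df %>%
    group_by(refund_description) %>%
    summarise(
      n = n(),
      q1 = quantile(refund_value, 0.25, na.rm = TRUE, names = FALSE),
      q3 = quantile(refund_value, 0.75, na.rm = TRUE, names = FALSE),
      iqr = q3 - q1,
      n_outliers = sum(refund_value < q1 - 1.5 * iqr |
                         refund_value > q3 + 1.5 * iqr, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    # Share of outliers, since larger categories naturally have more of them
    mutate(share_outliers = n_outliers / n) %>%
    arrange(desc(n_outliers))
}

# Made-up toy data: category "A" has one obvious outlier (100)
toy <- data.frame(
  refund_description = rep(c("A", "B"), each = 6),
  refund_value = c(10, 11, 12, 13, 14, 100, 10, 11, 12, 13, 14, 15)
)

outliers <- count_outliers(toy)
outliers
```

Looking at `share_outliers` as well as the raw count matters: a category with many more refunds will show more outlier points in a box plot even if its distribution is no stranger than the others.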
Looking more closely at “dissemination of parliamentary activity”, we find that the largest outlier in the whole plot belongs to it: R$184,500.00 reimbursed for expenses at a small print shop. This category has also cost taxpayers the most overall, a total of R$48,645,429.54.
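```r
library(plotly)  # provides plot_ly(), layout() and re-exports the %>% pipe

# Plot total value of refunds; desc_summ is expected to hold one row per
# refund_description with its total reimbursed value in refund_tot
desc_summ %>%
  plot_ly(
    x = ~refund_description,
    y = ~refund_tot,
    type = "bar",
    color = ~refund_description
  ) %>%
  layout(
    legend = list(orientation = "h"),  # horizontal legend
    xaxis = list(
      showticklabels = FALSE,  # category names appear in the legend instead
      title = ""
    )
  )
```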
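The excerpt does not show how `desc_summ` is built. A minimal sketch, assuming it is a per-category summary of the same `deputies` columns used in the box plot, could look like this; the helper `summarise_refunds()`, the extra `refund_max` column and the toy data are hypothetical, and the original table may have been computed differently.

```r
library(dplyr)

# Hypothetical reconstruction of `desc_summ`: one row per refund category
# with the total reimbursed (refund_tot) and the largest single refund.
summarise_refunds <- function(df) {
  df %>%
    group_by(refund_description) %>%
    summarise(
      refund_tot = sum(refund_value, na.rm = TRUE),
      refund_max = max(refund_value, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    arrange(desc(refund_tot))  # most expensive category first
}

# Made-up toy data standing in for `deputies`
toy <- data.frame(
  refund_description = c("A", "A", "B", "B", "B"),
  refund_value = c(100, 250, 40, 60, 80)
)

desc_summ <- summarise_refunds(toy)

# The single largest refund across all categories
top_refund <- slice_max(toy, refund_value, n = 1)
```

With the real data, `summarise_refunds(deputies)` would produce a table with the `refund_description` and `refund_tot` columns the bar-plot code expects, and `slice_max()` picks out the top outlier row.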